834 research outputs found

    Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation

    Full text link
    Myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment in order to scale to massive data sets. To accelerate these large-scale graph-based iterative computations, we propose delta-based accumulative iterative computation (DAIC). Different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the "changes" between iterations. By DAIC, we can process only the "changes" to avoid the negligible updates. Furthermore, we can perform DAIC asynchronously to bypass the high-cost synchronous barriers in heterogeneous distributed environments. Based on the DAIC model, we design and implement an asynchronous graph processing framework, Maiter. We evaluate Maiter on local cluster as well as on Amazon EC2 Cloud. The results show that Maiter achieves as much as 60x speedup over Hadoop and outperforms other state-of-the-art frameworks.Comment: ScienceCloud 2012, TKDE 201

    CSD: Discriminance with Conic Section for Improving Reverse k Nearest Neighbors Queries

    Full text link
    The reverse kk nearest neighbor (RkkNN) query finds all points that have the query point as one of their kk nearest neighbors (kkNN), where the kkNN query finds the kk closest points to its query point. Based on the characteristics of conic section, we propose a discriminance, named CSD (Conic Section Discriminance), to determine points whether belong to the RkkNN set without issuing any queries with non-constant computational complexity. By using CSD, we also implement an efficient RkkNN algorithm CSD-RkkNN with a computational complexity at O(k1.5⋅log k)O(k^{1.5}\cdot log\,k). The comparative experiments are conducted between CSD-RkkNN and other two state-of-the-art RkNN algorithms, SLICE and VR-RkkNN. The experimental results indicate that the efficiency of CSD-RkkNN is significantly higher than its competitors

    The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems

    Full text link
    Now we live in an era of big data, and big data applications are becoming more and more pervasive. How to benchmark data center computer systems running big data applications (in short big data systems) is a hot topic. In this paper, we focus on measuring the performance impacts of diverse applications and scalable volumes of data sets on big data systems. For four typical data analysis applications---an important class of big data applications, we find two major results through experiments: first, the data scale has a significant impact on the performance of big data systems, so we must provide scalable volumes of data sets in big data benchmarks. Second, for the four applications, even all of them use the simple algorithms, the performance trends are different with increasing data scales, and hence we must consider not only variety of data sets but also variety of applications in benchmarking big data systems.Comment: 16 pages, 3 figure

    A core stateless bandwidth broker architecture for scalable support of guaranteed services

    Full text link

    Making Networks Robust to Component Failures

    Get PDF
    In this thesis, we consider instances of component failure in the Internet and in networked cyber-physical systems, such as the communication network used by the modern electric power grid (termed the smart grid). We design algorithms that make these networks more robust to various component failures, including failed routers, failures of links connecting routers, and failed sensors. This thesis divides into three parts: recovery from malicious or misconfigured nodes injecting false information into a distributed system (e.g., the Internet), placing smart grid sensors to provide measurement error detection, and fast recovery from link failures in a smart grid communication network. First, we consider the problem of malicious or misconfigured nodes that inject and spread incorrect state throughout a distributed system. Such false state can degrade the performance of a distributed system or render it unusable. For example, in the case of network routing algorithms, false state corresponding to a node incorrectly declaring a cost of 0 to all destinations (maliciously or due to misconfiguration) can quickly spread through the network. This causes other nodes to (incorrectly) route via the misconfigured node, resulting in suboptimal routing and network congestion. We propose three algorithms for efficient recovery in such scenarios and evaluate their efficacy. The last two parts of this thesis consider robustness in the context of the electric power grid. We study the use and placement of a sensor, called a Phasor Measurement Unit (PMU), currently being deployed in electric power grids worldwide. PMUs provide voltage and current measurements at a sampling rate orders of magnitude higher than the status quo. As a result, PMUs can both drastically improve existing power grid operations and enable an entirely new set of applications, such as the reliable integration of renewable energy resources. However, PMU applications require correct (addressed in thesis part 2) and timely(covered in thesis part 3) PMU data. Without these guarantees, smart grid operators and applications may make incorrect decisions and take corresponding (incorrect) actions. The second part of this thesis addresses PMU measurement errors, which have been observed in practice. We formulate a set of PMU placement problems that aim to satisfy two constraints: place PMUs near each other to allow for measurement error detection and use the minimal number of PMUs to infer the state of the maximum number of system buses and transmission lines. For each PMU placement problem, we prove it is NP-Complete, propose a simple greedy approximation algorithm, and evaluate our greedy solutions. In the last part of this thesis, we design algorithms for fast recovery from link failures in a smart grid communication network. We propose, design, and evaluate solutions to all three aspects of link failure recovery: (a) link failure detection, (b) algorithms for pre-computing backup multicast trees, and (c) fast backup tree installation. To address (a), we design link-failure detection and reporting mechanisms that use OpenFlow to detect link failures when and where they occur inside the network. OpenFlow is an open source framework that cleanly separates the control and data planes for use in network management and control. For part (b), we formulate a new problem, Multicast Recycling, that pre-computes backup multicast trees that aim to minimize control plane signaling overhead. We prove Multicast Recycling is at least NP-hard and present a corresponding approximation algorithm. Lastly, two control plane algorithms are proposed that signal data plane switches to install pre-computed backup trees. An optimized version of each installation algorithm is designed that finds a near minimum set of forwarding rules by sharing forwarding rules across multicast groups. This optimization reduces backup tree install time and associated control state. We implement these algorithms using the POX open-source OpenFlow controller and evaluate them using the Mininet emulator, quantifying control plane signaling and installation time
    • …
    corecore